Is Multinomial PCA Multi-faceted Clustering or Dimensionality Reduction?
نویسندگان
چکیده
Discrete analogues to Principal Components Analysis (PCA) are intended to handle discrete or positive-only data, for instance sets of documents. The class of methods is appropriately called multinomial PCA because it replaces the Gaussian in the probabilistic formulation of PCA with a multinomial. Experiments to date, however, have been on small data sets, for instance, from early information retrieval collections. This paper demonstrates the method on two large data sets and considers two extremes of behaviour: (1) dimensionality reduction where the feature set (i.e., bag of words) is considerably reduced, and (2) multi-faceted clustering (or aspect modelling) where clustering is done but items can now belong in several clusters at once.
منابع مشابه
Efficient Optimization for Discriminative Latent Class Models
Dimensionality reduction is commonly used in the setting of multi-label supervised classification to control the learning capacity and to provide a meaningful representation of the data. We introduce a simple forward probabilistic model which is a multinomial extension of reduced rank regression, and show that this model provides a probabilistic interpretation of discriminative clustering metho...
متن کاملEigenanatomy: sparse dimensionality reduction for multi-modal medical image analysis.
Rigorous statistical analysis of multimodal imaging datasets is challenging. Mass-univariate methods for extracting correlations between image voxels and outcome measurements are not ideal for multimodal datasets, as they do not account for interactions between the different modalities. The extremely high dimensionality of medical images necessitates dimensionality reduction, such as principal ...
متن کاملThe Robust Clustering with Reduction Dimension
A clustering is process to identify a homogeneous groups of object called as cluster. Clustering is one interesting topic on data mining. A group or class behaves similarly characteristics. This paper discusses a robust clustering process for data images with two reduction dimension approaches; i.e. the two dimensional principal component analysis (2DPCA) and principal component analysis (PCA)....
متن کاملTreelets | A Tool for Dimensionality Reduction and Multi-Scale Analysis of Unstructured Data
In many modern data mining applications, such as analysis of gene expression or worddocument data sets, the data is highdimensional with hundreds or even thousands of variables, unstructured with no specific order of the original variables, and noisy. Despite the high dimensionality, the data is typically redundant with underlying structures that can be represented by only a few features. In su...
متن کاملMulti-Dimensional Features Reduction of Consistency Subset Evaluator on Unsupervised Expectation Maximization Classifier for Imaging Surveillance Application
This paper presents the application of multi dimensional feature reduction of Consistency Subset Evaluator (CSE) and Principal Component Analysis (PCA) and Unsupervised Expectation Maximization (UEM) classifier for imaging surveillance system. Recently, research in image processing has raised much interest in the security surveillance systems community. Weapon detection is one of the greatest c...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2003